Collecting Phonetic Data on Endangered Languages
Author
Abstract
Working with native speakers to find out about a language is one of the most rewarding and enjoyable activities in which linguists engage. Much of the crucial work to ensure a successful fieldwork experience is conducted in advance of actually collecting data in the field. Background work on the language should include locating all possible sources of data. Digital recorders and laptop computers provide enhanced opportunities for collecting accurate acoustic data. Acoustic and aerodynamic data can be supplemented with simple techniques for accessing articulatory data, such as palatography. Phonetic fieldwork is especially important given the endangered status of the majority of languages in the world.
Similar resources
Using automatic alignment to analyze endangered language data: testing the viability of untrained alignment.
While efforts to document endangered languages have steadily increased, the phonetic analysis of endangered language data remains a challenge. The transcription of large documentation corpora is, by itself, a tremendous feat. Yet, the process of segmentation remains a bottleneck for research with data of this kind. This paper examines whether a speech processing tool, forced alignment, can faci...
Pronunciation Lexicon Development for Under-Resourced Languages Using Automatically Derived Subword Units: A Case Study on Scottish Gaelic
Developing a phonetic lexicon for a language requires linguistic knowledge as well as human effort, which may not be available, particularly for under-resourced languages. To avoid the need for the linguistic knowledge, acoustic information can be used to automatically obtain the subword units and the associated pronunciations. Towards that, the present paper investigates the potential of a rec...
Comparison of spectral methods for spoken language identification
Automatic spoken language identification is the task of identifying a language from the speech signal. Language identification systems can be divided into two categories: spectral-based methods and phonetic-based methods. In the former, short-time characteristics of the speech spectrum are extracted as a multi-dimensional vector. A statistical model of these features is then obtained for each language. The ...
Phonetic structures of Aleut
A detailed analysis of the phonetic structures of Aleut, a moribund language spoken in Alaska, shows how much general phonetic information can be gathered from the investigation of an endangered language. Aleut has an unusual distribution of consonants, with varying functional loads. There are no bilabial stops. Among alveolar, velar and uvular stops, VOT is shorter for alveolar than for velar ...
Large-Scale Text Collection for Unwritten Languages
Existing methods for collecting texts from endangered languages are not creating the quantity of data that is needed for corpus studies and natural language processing tasks. This is because the process of transcribing and translating from audio recordings is too onerous. A more effective method, we argue, is to involve local speakers in the field location, using an audio-only translation inter...